add autoround._generate_recipe() #758
Signed-off-by: xinhe3 <[email protected]>
Force-pushed from 027b2c2 to d8b831e
Force-pushed from 419b63c to 79323d6
Have you tested it on an MoE model? It might require some special handling.
combination_list = []
numel_list = []
loss_list = []
for hp_layers in combinations(quantizable_layers, quantizable_num):
How about creating a new file in the experimental folder and moving these there?
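For anyone following this thread, here is a minimal sketch of the kind of exhaustive search the quoted loop appears to perform. It is not the PR's code: `enumerate_recipes`, `numel_of`, and `loss_of` are hypothetical names, and reading `hp_layers` as "layers kept in higher precision" is an assumption based on the variable name only.

```python
from itertools import combinations


def enumerate_recipes(quantizable_layers, quantizable_num, numel_of, loss_of):
    """Try every subset of `quantizable_num` layers and record its cost.

    numel_of: dict mapping layer name -> parameter count (hypothetical input).
    loss_of:  callable scoring one candidate subset (hypothetical input).
    """
    combination_list = []
    numel_list = []
    loss_list = []
    for hp_layers in combinations(quantizable_layers, quantizable_num):
        combination_list.append(hp_layers)
        # Total parameter count of the layers in this candidate subset.
        numel_list.append(sum(numel_of[name] for name in hp_layers))
        # Score the candidate; a real loss would come from calibration data.
        loss_list.append(loss_of(hp_layers))
    return combination_list, numel_list, loss_list


# Toy data, purely illustrative.
layers = ["q_proj", "k_proj", "v_proj"]
numel = {"q_proj": 16, "k_proj": 8, "v_proj": 8}


def toy_loss(hp_layers):
    # Stand-in loss: larger subsets of parameters give a smaller loss.
    return 1.0 / (1 + sum(numel[name] for name in hp_layers))


combos, numels, losses = enumerate_recipes(layers, 2, numel, toy_loss)
# Pick the candidate with the lowest loss (first one wins on ties).
best = combos[min(range(len(combos)), key=losses.__getitem__)]
```

Note that the number of candidates grows combinatorially with the number of layers, which is one reason keeping such a search in an experimental module may be reasonable.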
clear_memory()
input_ids = to_device(input_ids, self.cache_device)
input_others = to_device(input_others, self.cache_device)
if self.recipe_mode:
It would be better to wrap this new code into a function and call it as early as possible.
auto_round/autoround.py
Outdated
return current_input_ids, current_input_others

def _dump_average_bits(self, layer_config=None):
This function cannot be used by AutoRound, since layers are converted to QuantizedLinear after quantization. If the function can correctly dump average bits in typical scenarios such as INT4, I’d prefer to keep it in the class. Otherwise, it would be better to move it elsewhere for now.
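To make the discussion concrete, here is a rough sketch of what an average-bits calculation could look like when it works from a configuration instead of inspecting module types. This is an illustration only, not the PR's `_dump_average_bits`: the function name `average_bits`, the `numel_by_layer` argument, and the assumption that each `layer_config` entry carries a `"bits"` key are all hypothetical, and the sketch ignores the overhead of scales and zero points.

```python
def average_bits(layer_config, numel_by_layer):
    """Parameter-weighted average bit width across layers.

    layer_config:   dict mapping layer name -> {"bits": int} (assumed layout).
    numel_by_layer: dict mapping layer name -> parameter count (hypothetical).
    """
    total_bits = 0
    total_numel = 0
    for name, numel in numel_by_layer.items():
        bits = layer_config[name]["bits"]
        total_bits += bits * numel
        total_numel += numel
    return total_bits / total_numel


# Toy example: one 8-bit layer with 100 weights, one 4-bit layer with 300.
toy_config = {"a": {"bits": 8}, "b": {"bits": 4}}
toy_numel = {"a": 100, "b": 300}
avg = average_bits(toy_config, toy_numel)  # (8*100 + 4*300) / 400
```

Because it reads bit widths from the config, a helper along these lines would not depend on whether layers have already been converted to QuantizedLinear.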
Force-pushed from 580ec8b to f02331d
This code is used for INC accuracy tuning. Currently, only `mx_fp8` is supported for mixing with `mx_fp4` and `nv_fp4`.